81 research outputs found
Tied factor analysis for face recognition across large pose differences
Face recognition algorithms perform very unreliably when the pose of the probe face is different from the gallery face: typical feature vectors vary more with pose than with identity. We propose a generative model that creates a one-to-many mapping from an idealized “identity” space to the observed data space. In identity space, the representation for each individual does not vary with pose. We model the measured feature vector as being generated by a pose-contingent linear transformation of the identity variable in the presence of Gaussian noise. We term this model “tied” factor analysis. The choice of linear transformation (factors) depends on the pose, but the loadings are constant (tied) for a given individual. We use the EM algorithm to estimate the linear transformations and the noise parameters from training data.
We propose a probabilistic distance metric that allows a full posterior over possible matches to be established. We introduce a novel feature extraction process and investigate recognition performance by using the FERET, XM2VTS, and PIE databases. Recognition performance compares favorably with contemporary approaches
Meta-MeTTa: an operational semantics for MeTTa
We present an operational semantics for the language MeTTa
ImageSpirit: Verbal Guided Image Parsing
Humans describe images in terms of nouns and adjectives while algorithms
operate on images represented as sets of pixels. Bridging this gap between how
humans would like to access images versus their typical representation is the
goal of image parsing, which involves assigning object and attribute labels to
pixel. In this paper we propose treating nouns as object labels and adjectives
as visual attribute labels. This allows us to formulate the image parsing
problem as one of jointly estimating per-pixel object and attribute labels from
a set of training images. We propose an efficient (interactive time) solution.
Using the extracted labels as handles, our system empowers a user to verbally
refine the results. This enables hands-free parsing of an image into pixel-wise
object/attribute labels that correspond to human semantics. Verbally selecting
objects of interests enables a novel and natural interaction modality that can
possibly be used to interact with new generation devices (e.g. smart phones,
Google Glass, living room devices). We demonstrate our system on a large number
of real-world images with varying complexity. To help understand the tradeoffs
compared to traditional mouse based interactions, results are reported for both
a large scale quantitative evaluation and a user study.Comment: http://mmcheng.net/imagespirit
Efficient salient region detection with soft image abstraction
Detecting visually salient regions in images is one of the fundamental problems in computer vision. We propose a novel method to decompose an image into large scale per-ceptually homogeneous elements for efficient salient region detection, using a soft image abstraction representation. By considering both appearance similarity and spatial dis-tribution of image pixels, the proposed representation ab-stracts out unnecessary image details, allowing the assign-ment of comparable saliency values across similar regions, and producing perceptually accurate salient region detec-tion. We evaluate our salient region detection approach on the largest publicly available dataset with pixel accurate annotations. The experimental results show that the pro-posed method outperforms 18 alternate methods, reducing the mean absolute error by 25.2 % compared to the previous best result, while being computationally more efficient. 1
- …